ClinicalTrials.gov was first released in 2000. As of March 2019, ClinicalTrials.gov includes 300,676 research studies in all 50 states and in 208 countries. The CTTI AACT project and database provide a harmonizing schema and convenient access. However, major challenges remain for knowledge discovery using these data, such as the lack of standard terminology. To address this, for the use case of elucidating drug target hypotheses, we have used state-of-the-art, domain-specialized text mining with synonym resolution for specific classes of entities: (1) chemicals and (2) diseases. Chemicals are identified and resolved using NextMove LeadMine. Diseases, indications, and other phenotypic terms are mined via JensenLab Tagger with the Disease Ontology dictionary, complemented by NLM-supplied MeSH terms. Protein targets are associated via ChEMBL bioactivities, by cross-referencing molecular structures. Another fundamental challenge is to assess the confidence of inferences from noisy and disparate data. We propose a scoring system for assessing confidence in target hypotheses inferred from aggregated clinical trials, with emphasis on higher-confidence, novel predictions that have the potential to illuminate the understudied druggable genome. (Paragraph excerpted from manuscript draft.)
NCT_ID→(JensenLab:Tagger)→DOID
NCT_ID→(AACT)→MeSH
NCT_ID→(NextMove:LeadMine)→SMILES
SMILES→(PubChem)→CID
CID→(PubChem)→INCHIKEY
INCHIKEY→(ChEMBL)→MOLECULE_CHEMBL_ID
MOLECULE_CHEMBL_ID→(ChEMBL)→ACTIVITY_ID
ACTIVITY_ID→(ChEMBL)→TARGET_CHEMBL_ID
TARGET_CHEMBL_ID→(ChEMBL)→COMPONENT_ID
COMPONENT_ID→(ChEMBL)→UNIPROT
ACTIVITY_ID→(ChEMBL)→DOCUMENT_CHEMBL_ID
DOCUMENT_CHEMBL_ID→(ChEMBL)→PUBMED_ID
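Each arrow above is a two-column mapping table, so a linear path such as SMILES→CID→INCHIKEY amounts to successive inner joins on the shared identifier column. The following is a minimal base-R sketch of that composition, not the report's own code (the report loads data.table); the toy tables, identifiers, and the helper `chain_mappings` are hypothetical.

```r
# Hypothetical toy mapping tables (not real identifiers).
smi2cid <- data.frame(SMILES = c("smiA", "smiB", "smiC"),
                      CID    = c(1L, 2L, 2L),
                      stringsAsFactors = FALSE)
cid2ink <- data.frame(CID      = c(1L, 2L),
                      INCHIKEY = c("KEY-1", "KEY-2"),
                      stringsAsFactors = FALSE)

# Fold a list of mapping tables into one, joining each new table on the single
# column it shares with the accumulated result. merge() is an inner join by
# default, so IDs that fail to map at any step drop out.
chain_mappings <- function(tables) {
  Reduce(function(acc, nxt) {
    key <- intersect(names(acc), names(nxt))
    stopifnot(length(key) == 1)  # expect exactly one shared ID column
    merge(acc, nxt, by = key)
  }, tables)
}

smi2ink <- chain_mappings(list(smi2cid, cid2ink))
print(smi2ink[order(smi2ink$SMILES), ])
```

Branching steps (ACTIVITY_ID leads to both targets and documents) would need separate chains joined afterwards on ACTIVITY_ID.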
aact_studies.tsv
aact_drugs.tsv
aact_descriptions.tsv
aact_drugs_leadmine.tsv
aact_drugs_smi_pubchem_cid.tsv
aact_drugs_smi_pubchem_cid2ink.tsv
aact_drugs_ink2chembl.tsv
aact_drugs_chembl_activity.tsv
aact_drugs_chembl_target_component.tsv
aact_drugs_chembl_document.tsv
pharos_targets.tsv
aact_descriptions_tagger_disease_matches.tsv
diseases_entities.tsv
nct_id is the study ID.
## [1] "Mon Dec 2 09:57:13 2019"
```r
library(readr)
library(data.table)
library(tm)
library(stringr)
library(plotly, quietly = TRUE)
```
## Warning: As of rlang 0.4.0, dplyr must be at least version 0.8.0.
## ✖ dplyr 0.7.8 is too old for rlang 0.4.1.
## ℹ Please update dplyr with `install.packages("dplyr")`.
Read file of all studies in AACT.
## [1] "Total studies: 300214 ; unique NCT_IDs: 300214"
References of type results_reference may offer stronger evidence and greater confidence.
## [1] "references: 388031; NCT_IDs: 61208; PMIDs: 287758; results_references: 64880"
Read file of all drugs in AACT.
id is the AACT INTERVENTION_ID, corresponding to one instance of a drug (with its dose, delivery, etc.) in a study.
## [1] "Unique drug names: 91347 ; unique intervention IDs: 255077"
Select only Interventional studies (study_type) associated with drugs (via NCT_ID).
## [1] "Interventional studies: 237892 (79.2%)"
## [1] "Interventional drug studies: 124421 ; unique NCT_IDs: 124421"
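The selection above can be sketched as a filter on study_type followed by a semi-join on nct_id. This is a minimal base-R illustration under assumed column names modeled on the AACT fields mentioned in the text; the toy rows are hypothetical and this is not the report's code.

```r
# Hypothetical toy tables modeled on AACT studies and drug interventions.
studies <- data.frame(
  nct_id     = c("NCT01", "NCT02", "NCT03", "NCT04"),
  study_type = c("Interventional", "Observational", "Interventional", "Interventional"),
  stringsAsFactors = FALSE
)
drugs <- data.frame(
  id     = 1:3,                                # intervention ID
  nct_id = c("NCT01", "NCT01", "NCT02"),
  name   = c("metformin", "ibuprofen", "aspirin"),
  stringsAsFactors = FALSE
)

# Filter on study_type and report the share of interventional studies.
is_interv <- studies$study_type == "Interventional"
interv <- studies[is_interv, ]
print(sprintf("Interventional studies: %d (%.1f%%)", nrow(interv), 100 * mean(is_interv)))

# Semi-join: keep interventional studies with at least one drug intervention.
interv_drug <- interv[interv$nct_id %in% drugs$nct_id, ]
print(sprintf("Interventional drug studies: %d ; unique NCT_IDs: %d",
              nrow(interv_drug), length(unique(interv_drug$nct_id))))
```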
| phase | N_studies | N_drugs |
|---|---|---|
| Early Phase 1 | 1574 | 2615 |
| Phase 1 | 23603 | 48593 |
| Phase 1/Phase 2 | 6663 | 13288 |
| Phase 2 | 33910 | 68850 |
| Phase 2/Phase 3 | 3305 | 6503 |
| Phase 3 | 22988 | 49507 |
| Phase 4 | 19593 | 36331 |
| NA | 12785 | 29390 |
| overall_status | N_studies | N_drugs |
|---|---|---|
| Active, not recruiting | 6420 | 13962 |
| Completed | 72053 | 145006 |
| Enrolling by invitation | 638 | 1060 |
| Not yet recruiting | 4138 | 8001 |
| Recruiting | 16723 | 33973 |
| Suspended | 463 | 945 |
| Terminated | 10138 | 19618 |
| Unknown status | 10106 | 18463 |
| Withdrawn | 3742 | 6969 |
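Per-group counts like those in the phase and overall_status tables can be sketched with `tapply` over a grouping factor. Here N_drugs is taken to be the number of distinct intervention IDs; that is an assumption, since the report does not define the column. The toy rows and column names are hypothetical; `addNA` keeps missing phases as their own group, mirroring the NA row.

```r
# Hypothetical study-drug rows: one row per drug intervention.
links <- data.frame(
  nct_id          = c("NCT01", "NCT01", "NCT02", "NCT03"),
  intervention_id = c(11L, 12L, 21L, 31L),
  phase           = c("Phase 1", "Phase 1", "Phase 2", NA),
  stringsAsFactors = FALSE
)

# Treat NA as an explicit level so studies without a phase are counted too.
grp <- addNA(factor(links$phase))
n_distinct <- function(x) length(unique(x))

counts <- data.frame(
  phase     = levels(grp),
  N_studies = as.vector(tapply(links$nct_id, grp, n_distinct)),
  N_drugs   = as.vector(tapply(links$intervention_id, grp, n_distinct)),
  stringsAsFactors = FALSE
)
print(counts)
```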
AACT drug names are resolved to standard names and structures (SMILES). Note that one name may include multiple chemicals. This enables cheminformatically rigorous counts of drugs as active pharmaceutical ingredients (APIs).
## [1] "Drug unique SMILES resolved by LeadMine: 4699 ; unique intervention IDs: 171741"
| N_mentions | names |
|---|---|
| 2637 | Abraxane; PACLITAXEL; Paclitaxel; Taxol; abraxane; paclitaxel; taxol |
| 2545 | CYCLOPHOSPHAMIDE; Ciclophosphamide; Cyclophosphamid; Cyclophosphamide; ciclophosphamide; cyclophosphamide |
| 2461 | CISPLATIN; Cis-platinum; Cisplatin; Cisplatine; Cisplatinum; cis Platinum; cis-platinum; cisplatin; cisplatine; cisplatinum |
| 2070 | DEXAMETHASONE; Dexamethason; Dexamethasone; Dexamethosone; Maxitrol; OZURDEX; Oradexon; Ozurdex; dexamethason; dexamethasone; dexamethosone |
| 2054 | CARBOPLATIN; Carboplatin; Carboplatine; Paraplatin; carboplatin; carboplatine |
| 1779 | DOCETAXEL; Docetaxel; docetaxel |
| 1625 | METFORMIN; MetFORMIN; Metformin; Metformine; metformin; metformine |
| 1540 | GEMCITABINE; Gemcitabine; gemcitabine |
| 1342 | CAPECITABINE; Capecitabin; Capecitabine; XELODA; Xeloda; capecitabine; xeloda |
| 1178 | Cortancyl; Lodotra; Meticorten; Prednison; Prednisone; RAYOS; prednison; prednisone |
| 1157 | 0xaliplatin; Eloxatin; OXALIPLATIN; OXAliplatin; Oxaliplatin; Oxaliplatine; eloxatin; oxaliplatin; oxaliplatine |
| 1157 | METHOTREXATE; Methotrexate; Metoject; methotrexate |
| 1086 | BUPIVACAINE; Bupivacain; Bupivacaine; EXPAREL; Exparel; SKY0402; bupivacain; bupivacaine |
| 1044 | ETOPOSIDE; Etoposid; Etoposide; etoposide |
| 1027 | ADOPORT; ADVAGRAF; Adoport; Advagraf; ENVARSUS; Envarsus; FK-506; FK506; PROGRAF; Prograf; Protopic; TACROLIMUS; Tacrolimus; tacrolimus |
| 978 | NORMAL SALINE; Normal Saline; Normal saline; normal salin; normal saline |
| 977 | LIDOCAINE; LMX 4; LMX4; Lidocain; Lidocaine; Lidoderm; Lignocain; Lignocaine; Oraqix; lidocain; lidocaine; lignocaine |
| 908 | CYTARABINE; Cytarabine; Cytosar; DepoCyt; DepoCyte; Depocyt; Depocyte; cytarabine; cytosar |
| 903 | COPEGUS; Copegus; REBETOL; RIBAVIRIN; Rebetol; Ribasphere; Ribavarin; Ribavirin; Ribavirine; Virazole; rebetol; ribavarin; ribavirin |
| 846 | Diprivan; PROPOFOL; Propofol; propofol |
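Collapsing LeadMine name matches to one row per resolved structure, with a mention count and the distinct names, can be sketched as follows. The column names (`smiles`, `name`) and toy rows are hypothetical. Names are sorted with `method = "radix"`, which uses C-locale (byte) order; that ordering appears consistent with the name lists in the table above, but this is an inference, not the report's stated method.

```r
# Hypothetical LeadMine hits: one row per drug-name mention with its SMILES.
hits <- data.frame(
  smiles = c("S1", "S1", "S1", "S2", "S2"),
  name   = c("Taxol", "paclitaxel", "paclitaxel", "Cisplatin", "cisplatin"),
  stringsAsFactors = FALSE
)

# Group names by structure; count mentions and list distinct names.
by_smi <- split(hits$name, hits$smiles)
summary_tbl <- data.frame(
  smiles     = names(by_smi),
  N_mentions = lengths(by_smi),
  names      = vapply(by_smi,
                      function(x) paste(sort(unique(x), method = "radix"), collapse = "; "),
                      character(1)),
  row.names  = NULL,
  stringsAsFactors = FALSE
)

# Most-mentioned structures first.
summary_tbl <- summary_tbl[order(-summary_tbl$N_mentions), ]
print(summary_tbl)
```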
## [1] "Drugs (drug names) with resolved structure: 180555 / 197300 (91.5%)"
## [1] "Mentions by intervention ID: 157862 / 171741 (91.9%)"
## [1] "Mentions by study: 92966 / 99647 (93.3%)"
## [1] "Mentions by drug name: 11108 / 58297 (19.1%)"
## [1] "PubChem SMILES2CID hits: 3933 / 4540 (86.6%)"
## [1] "Intervention IDs mapped to PubChem CIDs (via SMILES): 153342"
## [1] "PubChem CIDs with InChIKeys: 3783"
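The report does not state which PubChem interface performed the SMILES→CID and CID→InChIKey lookups. Assuming the public PUG REST service, the request URLs could be built as below. The helper names are hypothetical, and the sketch only constructs URLs (no network calls); for long or unusual SMILES, an HTTP POST request may be more robust than a URL path.

```r
# PubChem PUG REST base URL.
pug_base <- "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

smiles_to_cid_url <- function(smiles) {
  # Percent-encode reserved characters such as '(', '=', '#', '/'.
  # repeated = TRUE is needed because SMILES ring-closure labels like "%10"
  # would otherwise be mistaken for existing percent-encoding and skipped.
  enc <- vapply(smiles, URLencode, character(1),
                reserved = TRUE, repeated = TRUE, USE.NAMES = FALSE)
  sprintf("%s/compound/smiles/%s/cids/TXT", pug_base, enc)
}

cid_to_inchikey_url <- function(cids) {
  # Several CIDs can be requested at once as a comma-separated list.
  sprintf("%s/compound/cid/%s/property/InChIKey/CSV", pug_base,
          paste(cids, collapse = ","))
}

print(smiles_to_cid_url("CC(=O)OC1=CC=CC=C1C(=O)O"))  # aspirin
print(cid_to_inchikey_url(c(2244, 3672)))
```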
Read Pharos targets (pharos_targets.tsv) for Target Development Level (TDL) and other metadata.
Alternatively, compounds could perhaps be mapped from PubChem CIDs to ChEMBL via UniChem.
## [1] "ChEMBL compounds mapped via InChIKeys: 3316"
Select only activities with pChEMBL values, for relevance to protein targets and for confidence.
## [1] "ChEMBL activities: 127943"
## [1] "ChEMBL activities molecules: 2302 ; canonical_smiles: 2302 ; targets: 3877 ; documents: 16959"
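The pChEMBL filter and the distinct counts above can be sketched as follows. The column names follow the ChEMBL web-services activity fields (`activity_id`, `molecule_chembl_id`, `target_chembl_id`, `document_chembl_id`, `pchembl_value`); the toy rows are hypothetical, and this is not the report's code.

```r
# Hypothetical ChEMBL activity rows; NA means no pChEMBL value was assigned.
act <- data.frame(
  activity_id        = 1:5,
  molecule_chembl_id = c("M1", "M1", "M2", "M3", "M3"),
  target_chembl_id   = c("T1", "T2", "T1", "T3", "T3"),
  document_chembl_id = c("D1", "D1", "D2", "D3", "D4"),
  pchembl_value      = c(7.2, NA, 5.1, NA, 6.4),
  stringsAsFactors = FALSE
)

# Keep only activities that carry a pChEMBL value.
act_p <- act[!is.na(act$pchembl_value), ]

n_distinct <- function(x) length(unique(x))
print(sprintf("ChEMBL activities: %d", nrow(act_p)))
print(sprintf("molecules: %d ; targets: %d ; documents: %d",
              n_distinct(act_p$molecule_chembl_id),
              n_distinct(act_p$target_chembl_id),
              n_distinct(act_p$document_chembl_id)))
```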
| assay_type | N_molecule | N_activity |
|---|---|---|
| F:Functional | 1828 | 73811 |
| B:Binding | 1831 | 49891 |
| A:ADMET | 759 | 4058 |
| P:Physicochemical | 44 | 120 |
| T:Toxicity | 28 | 59 |
| U:Unclassified | 3 | 4 |
## [1] "ChEMBL target proteins: 3157"
## [1] "ChEMBL target proteins mapped to TCRD (human): 1805"
## [1] "Organisms: 187"
| organism | N_targets | Types |
|---|---|---|
| Homo sapiens | 1806 | CHIMERIC PROTEIN; PROTEIN COMPLEX; PROTEIN COMPLEX GROUP; PROTEIN FAMILY; PROTEIN-PROTEIN INTERACTION; SELECTIVITY GROUP; SINGLE PROTEIN |
| Rattus norvegicus | 529 | PROTEIN COMPLEX; PROTEIN COMPLEX GROUP; PROTEIN FAMILY; SELECTIVITY GROUP; SINGLE PROTEIN |
| Mus musculus | 238 | CHIMERIC PROTEIN; PROTEIN COMPLEX; PROTEIN COMPLEX GROUP; PROTEIN FAMILY; SINGLE PROTEIN |
| Bos taurus | 98 | PROTEIN COMPLEX; PROTEIN COMPLEX GROUP; PROTEIN FAMILY; SINGLE PROTEIN |
| Sus scrofa | 36 | PROTEIN COMPLEX; PROTEIN FAMILY; SINGLE PROTEIN |
| Cavia porcellus | 26 | SINGLE PROTEIN |
| Escherichia coli K-12 | 19 | PROTEIN COMPLEX; PROTEIN FAMILY; SINGLE PROTEIN |
| Oryctolagus cuniculus | 18 | SINGLE PROTEIN |
| Escherichia coli | 17 | PROTEIN COMPLEX; SINGLE PROTEIN |
| Mycobacterium tuberculosis | 17 | SINGLE PROTEIN |
## [1] "Human targets: 1806"
| idgFamily | N |
|---|---|
| Kinase | 405 |
| Enzyme | 330 |
| GPCR | 158 |
| None | 120 |
| IC | 64 |
| Transporter | 53 |
| Epigenetic | 35 |
| NR | 28 |
| TF | 20 |
| TF; Epigenetic | 3 |
## [1] "Human single-protein targets: 1216 ; unique UniProts: 1216"
## [1] " Tchem: 767" " Tclin: 342" " Tbio: 105"
## [4] " Tdark: 2"
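Restricting to human single-protein targets and tabulating TDL, as in the counts above, might look like this. The data frame and its column names (`uniprot`, `organism`, `target_type`, `idgTDL`) are hypothetical stand-ins for the merged ChEMBL and Pharos tables.

```r
# Hypothetical merged target table: one row per ChEMBL target component.
tgt <- data.frame(
  uniprot     = c("U1", "U1", "U2", "U3", "U4"),
  organism    = c("Homo sapiens", "Homo sapiens", "Homo sapiens",
                  "Homo sapiens", "Rattus norvegicus"),
  target_type = c("SINGLE PROTEIN", "SINGLE PROTEIN", "SINGLE PROTEIN",
                  "PROTEIN COMPLEX", "SINGLE PROTEIN"),
  idgTDL      = c("Tclin", "Tclin", "Tchem", "Tbio", "Tdark"),
  stringsAsFactors = FALSE
)

# Keep human single-protein targets, one row per UniProt accession.
hs <- tgt[tgt$organism == "Homo sapiens" & tgt$target_type == "SINGLE PROTEIN", ]
hs <- hs[!duplicated(hs$uniprot), ]

# Fixed level order so absent levels still appear with a zero count.
tdl_counts <- table(factor(hs$idgTDL, levels = c("Tclin", "Tchem", "Tbio", "Tdark")))
print(tdl_counts)
```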
Disease mentions are tagged by JensenLab Tagger with the DOID entities dictionary, applied to descriptions from the detailed_descriptions table.
serialno corresponds to the DOID.
id is the AACT primary key.
Likely false positives, manually removed:
## [1] "Total disease mentions: 497207 (in 124421 studies)"
| doid | N_mentions | terms |
|---|---|---|
| DOID:162 | 28596 | CANCER; CANcer; Cancer; Malignant Tumor; Malignant neoplasm; Malignant tumor; Primary Cancer; Primary cancer; cancer; malignant Tumor; malignant neoplasm; malignant tumor; primary cancer |
| DOID:9351 | 17274 | DIABETES; DIABETES MELLITUS; DIAbetes; DIabetes; Diabetes; Diabetes Mellitus; Diabetes mellitus; diabetes; diabetes Mellitus; diabetes mellitus; diabetes-mellitus |
| DOID:6713 | 16632 | CVA; Cerebrovascular Accident; Cerebrovascular Disease; Cerebrovascular accident; Cerebrovascular disease; STROKE; STRokE; Stroke; cerebro- vascular disease; cerebro-vascular disease; cerebrovascul… |
| DOID:2030 | 12084 | ANXIETY; Anxiety; Anxiety Disorder; Anxiety state; anxiety; anxiety disorder; anxiety state; anxiety syndrome; anxiety-state |
| DOID:1612 | 10583 | BREAST CANCER; BReast CAncer; BReast Cancer; Breast Cancer; Breast cancer; Breast tumor; Breast-cancer; Primary breast cancer; breast Cancer; breast caNcEr; breast cancer; breast tumor; breast-canc… |
| DOID:2841 | 10021 | ASTHMA; Asthma; BHR; Bronchial hyper-reactivity; Bronchial hyperreactivity; EIA; Exercise-induced asthma; asthma; bronchial hyper reactivity; bronchial hyper-reactivity; bronchial hyperreactivity; … |
| DOID:3083 | 9782 | CHRONIC OBSTRUCTIVE PULMONARY DISEASE; COLD; COPD; COPd; Chronic Obstructive Lung Disease; Chronic Obstructive Lung disease; Chronic Obstructive Pulmonary Disease; Chronic Obstructive Pulmonary dis… |
| DOID:9970 | 9303 | OBESITY; OBesity; Obesity; obEsity; obe-sity; obesity |
| DOID:10763 | 9144 | HBP; HTN; HYPERTENSION; High Blood Pressure; High blood pressure; High-blood pressure; Hypertension; Hypertensive disease; high blood Pressure; high blood pressure; high blood-pressure; htn; hyper-… |
| DOID:3393 | 6816 | C-HD; CAD; CHD; CORONARY ARTERY DISEASE; CORONARY SYNDROME; CORONARY syndrome; ChD; Coronary ARtery DIsease; Coronary Artery Disease; Coronary Disease; Coronary Heart Disease; Coronary Heart diseas… |
| DOID:0060145 | 6115 | ANALGESIA; Analgesia; analgeSia; analgesia |
| DOID:9352 | 5848 | Diabetes Mellitus Type 2; Diabetes Mellitus Type II; Diabetes Mellitus type 2; Diabetes Mellitus, Type II; Diabetes mellitus Type 2; Diabetes mellitus non-insulin-dependent; Diabetes mellitus type … |
| DOID:10283 | 5056 | Familial Prostate Cancer; HPC; PRostate Cancer; Prostate CAncer; Prostate Cancer; Prostate cancer; Prostatic cancer; hereditary prostate cancer; prostate Cancer; prostate cancer; prostate-cancer; p… |
| DOID:8469 | 4985 | FLU; Flu; Influenza; flu; influenza |
| DOID:225 | 4962 | SYNDROME; Syndrome; syn drome; syndrome |
| DOID:3908 | 4959 | NSCLC; Non Small Cell Lung Cancer; Non Small Cell Lung Carcinoma; Non Small Cell Lung cancer; Non small cell lung cancer; Non small-cell lung cancer; Non- small cell lung cancer; Non-Small Cell Lun… |
| DOID:784 | 4841 | CKD; CKF; CRD; CRF; Chronic Kidney Disease; Chronic Kidney disease; Chronic Kidney failure; Chronic Renal Disease; Chronic kidney disease; Chronic kidney failure; Chronic renal disease; chronic Kid… |
| DOID:5419 | 4689 | SCHIZOPHRENIA; Schizophrenia; schizophrenia |
| DOID:684 | 3836 | HCC; HEPATOCELLULAR CARCINOMA; Hepatocellular Carcinoma; Hepatocellular carcinoma; Hepatoma; hcc; hepato-cellular carcinoma; hepatocellular Carcinoma; hepatocellular carcinoma; hepatoma |
| DOID:5844 | 3664 | Heart Attack; Heart attack; MYOCARDIAL INFARCTION; Myocardial Infarct; Myocardial Infarction; Myocardial infarct; Myocardial infarction; heart attack; myo-cardial infarction; myocardiaL infARction;… |
Sort synonym terms by frequency.
| nct_id | doid | N_mentions | disease_terms |
|---|---|---|---|
| NCT00448669 | DOID:526 | 2 | HIV infection |
| NCT00635674 | DOID:0050848 | 11 | OSA;obstructive sleep apnea |
| NCT00635674 | DOID:0014667 | 6 | metabolic syndrome |
| NCT00635674 | DOID:1936 | 2 | atherosclerosis |
| NCT00635674 | DOID:10763 | 1 | hypertension |
| NCT00775606 | DOID:11405 | 1 | diphtheria |
| NCT00775606 | DOID:11338 | 1 | tetanus |
| NCT01118520 | DOID:7693 | 4 | AAA;abdominal aortic aneurysm |
| NCT01169051 | DOID:1324 | 2 | lung cancer |
| NCT01169051 | DOID:162 | 2 | cancer |
| NCT01169051 | DOID:0060224 | 1 | atrial fibrillation |
| NCT01169051 | DOID:0060145 | 1 | analgesia |
| NCT01169051 | DOID:5041 | 1 | esophageal cancer |
| NCT01169051 | DOID:1612 | 1 | breast cancer |
| NCT01285219 | DOID:1227 | 3 | neutropenia |
| NCT01673893 | DOID:5844 | 2 | Myocardial Infarction;myocardial infarct |
| NCT01693484 | DOID:9351 | 1 | diabetes |
| NCT01693484 | DOID:848 | 1 | arthritis |
| NCT01693484 | DOID:178 | 1 | vascular disease |
| NCT01927887 | DOID:3310 | 1 | allergic |
| NCT01927887 | DOID:2355 | 1 | anemia |
| NCT01927887 | DOID:1781 | 1 | thyroid cancer |
| NCT01927887 | DOID:1205 | 1 | allergy |
| NCT03300830 | DOID:0111157 | 5 | Castleman disease |
| NCT03300830 | DOID:162 | 4 | cancer |
| NCT03300830 | DOID:0111152 | 1 | Multicentric Castleman Disease |
| NCT03300830 | DOID:8632 | 1 | Kaposi sarcoma |
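The per-study table above (one row per nct_id and doid, with a mention count and matched terms ordered by frequency) can be sketched in base R. Column names and toy rows are hypothetical; ties in term frequency are broken alphabetically here, which is an assumption.

```r
# Hypothetical Tagger matches: one row per matched term occurrence.
m <- data.frame(
  nct_id = c("NCT1", "NCT1", "NCT1", "NCT1", "NCT2"),
  doid   = c("DOID:0050848", "DOID:0050848", "DOID:0050848", "DOID:10763", "DOID:526"),
  term   = c("OSA", "OSA", "obstructive sleep apnea", "hypertension", "HIV infection"),
  stringsAsFactors = FALSE
)

# Distinct terms, most frequent first, joined with ";".
terms_by_freq <- function(x) {
  tab <- table(x)
  cnt <- as.vector(tab)
  nm  <- names(tab)
  paste(nm[order(-cnt, nm)], collapse = ";")
}

# Group by (nct_id, doid) using a tab-separated composite key.
key    <- paste(m$nct_id, m$doid, sep = "\t")
groups <- split(m$term, key)
out <- data.frame(
  nct_id        = sub("\t.*$", "", names(groups)),
  doid          = sub("^[^\t]*\t", "", names(groups)),
  N_mentions    = lengths(groups),
  disease_terms = vapply(groups, terms_by_freq, character(1)),
  row.names     = NULL,
  stringsAsFactors = FALSE
)
out <- out[order(out$nct_id, -out$N_mentions), ]
print(out)
```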
Many false positives arise from synonym collisions with common words (e.g. “Aim 1”, “Nut”).
## [1] "Total target mentions: 556908 (in 124421 studies)"
| ensp | N_mentions | terms |
|---|---|---|
| ENSP00000380432 | 19677 | I-NS; insulin; Insulin; ins; Ins; INSULIN; INS; 1 FU 2; 1HI T; inSUlin; INs; INsulin; 3 in C; InS; InsuLin |
| ENSP00000376823 | 14422 | MRI; MRi; MRI 2; mri; MRI2; MR I; MRI_2; MRI 2; MRI-2 |
| ENSP00000255030 | 4513 | C-Reactive protein; CRP; C-reactive protein; C reactive protein; C Reactive Protein; c reactive protein; c-reactive protein; C-Reactive Protein; CRp; crp; C-reactive Protein; CrP; C - reactive prot… |
| ENSP00000225474 | 4343 | filgrastim; granulocyte colony-stimulating factor; G-CSF; granulocyte-colony stimulating factor; Filgrastim; GCSF; granulocyte colony stimulating factor; Granulocyte-colony stimulating factor; Gran… |
| ENSP00000275493 | 4124 | EGFR; epidermal growth factor receptor; eGFR; HER1; Epidermal growth factor receptor; ERBB; Men A; HER-1; MenA; erythroblastic leukemia viral; Epidermal Growth Factor receptor; ERBB1; e-GFR; Epider… |
| ENSP00000478570 | 4016 | VEGF; vascular endothelial growth factor; Vascular Endothelial Growth Factor; Vascular endothelial growth factor; VEGF family; vascular-endothelial growth factor; 1 mkg; vegf; vascular endothelial … |
| ENSP00000398698 | 3661 | Tumor Necrosis Factor; Tumor necrosis factor; TNF; TNF alpha; tumor necrosis factor; TNFalpha; TNF-alpha; DIF; TNFa; Dif; TNFA; TNF-a; Tumor necrosis Factor; TNF-Alpha; dif; TNF Alpha; TNF-A; tumor… |
| ENSP00000011653 | 3611 | CD4; CD4-receptor; CD 4; 3 CD4; CD4 receptor; CD4 molecule; CD-4 |
| ENSP00000314151 | 3480 | PSA; prostate specific antigen; PsA; prostate-specific antigen; aPS; Prostate Specific Antigen; Prostate specific antigen; APs; ApS; Prostate specific Antigen; APS; Prostate-Specific Antigen; Prost… |
| ENSP00000452780 | 3459 | 1 of 2; 5 men; 1- Age; beta-2 microglobulin; 3 MRI; 3 to -2; 3 g iv; 4 pre; 5 mEq; 2 HLA; B2M; beta 2-microglobulin; beta2-microglobulin; 1 HLA; 3 Low; beta2 microglobulin; 3 low; 5 meq; 5mEq; 5 in… |
| ENSP00000327246 | 3240 | 1 of 2; 3 - HCV; VIPR; 3 HCV; 1-of-2; 1of 2 |
| ENSP00000226730 | 3152 | Interleukin-2; IL-2; aldesleukin; IL2; interleukin 2; interleukin-2; interleukin2; hIL2; Il-2; Aldesleukin; IL - 2; Interleukin 2; interleukin—2; ALDESLEUKIN; I L-2; T cell growth factor; lymphok… |
| ENSP00000313950 | 3060 | Aim 1; AIM 1; aim 1; Aim1; Aim 1; Aim-1; AURORA 1; aim1; AIM1; Aurora B; aim 1; Aim- 1 |
| ENSP00000296589 | 3051 | Aim 1; AIM 1; aim 1; Aim1; Aim 1; Aim-1; aim1; AIM1; SLC45A2; aim 1; Aim- 1 |
| ENSP00000358062 | 3047 | Aim 1; AIM 1; aim 1; Aim1; Aim 1; Aim-1; aim1; AIM1; aim 1; Aim- 1; ST4 |
| ENSP00000385675 | 2957 | interleukin -6; IL-6; IL6; Interleukin 6; interleukin-6; Interleukin-6; interleukin 6; CDF; HGF; Interleukin- 6; IL- 6; BSF-2; IL 6; Il-6; il-6; Interleukin - 6; IL—6; InterLeukin-6; interleukin-… |
| ENSP00000269571 | 2877 | HER2; human epidermal growth factor receptor 2; HER-2; ErbB2; Human Epidermal Growth Factor Receptor 2; Her2/Neu; human epidermal growth factor receptor-2; ERBB2; erb-b2 receptor tyrosine kinase 2;… |
| ENSP00000387662 | 2872 | GLP-1; glucagon; glucagon-like peptide-1; GLP1; Glucagon-like peptide-1; Glucagon; glucagon-like peptide 1; glucagon like peptide-1; glucagon-like-peptide-1; glucagon-like peptide 2; GLP2; glucagon… |
| ENSP00000295897 | 2851 | albumin; serum albumin; Albumin; Serum Albumin; Serum albumin; HSA; alb; ALB; Alb; hsa; ALbumin; 2b XL |
| ENSP00000357112 | 2757 | Aim 2; AIM 2; aim 2; Aim-2; AIM2; Aim2 |
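One way to apply a manually curated false-positive list is a normalized, case-insensitive term filter. The exclusion list, helper names, and placeholder ENSP values below are hypothetical; the normalization (lower-casing and removing hyphens and whitespace) is an assumption chosen so that variants such as "Aim-1", "AIM 1", and "aim1" compare equal.

```r
# Hypothetical exclusion list, seeded with collisions like those in the table above.
exclude_terms <- c("aim 1", "aim 2", "mri", "1 of 2", "nut")

normalize_term <- function(x) {
  # Lower-case, then drop hyphens and whitespace.
  gsub("[-[:space:]]+", "", tolower(x))
}

# Keep only matches whose normalized term is not on the exclusion list.
filter_matches <- function(matches, exclude) {
  keep <- !(normalize_term(matches$term) %in% normalize_term(exclude))
  matches[keep, , drop = FALSE]
}

matches <- data.frame(
  nct_id = c("NCT1", "NCT1", "NCT2", "NCT3"),
  ensp   = c("ENSP_A", "ENSP_B", "ENSP_A", "ENSP_C"),  # placeholder IDs
  term   = c("Aim-1", "EGFR", "AIM 1", "MRI"),
  stringsAsFactors = FALSE
)
kept <- filter_matches(matches, exclude_terms)
print(kept)
```

A term-level blacklist cannot separate genuine from spurious uses of an ambiguous abbreviation, so it trades some recall for precision.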
Sort synonym terms by frequency.
| nct_id | ensp | N_mentions | target_terms |
|---|---|---|---|
| NCT00238043 | ENSP00000252723 | 4 | Epoetin |
| NCT00702195 | ENSP00000410257 | 1 | IVF |
| NCT00731159 | ENSP00000410257 | 1 | IVF |
| NCT00960518 | ENSP00000309968 | 2 | TACE |
| NCT01168609 | ENSP00000333203 | 4 | PCI |
| NCT01168609 | ENSP00000216714 | 1 | apex |
| NCT01168609 | ENSP00000317780 | 1 | Cox |
| NCT01168609 | ENSP00000321260 | 1 | Cox |
| NCT01215994 | ENSP00000343656 | 3 | GFR |
| NCT01215994 | ENSP00000381448 | 2 | cystatin C |
| NCT02134977 | ENSP00000328236 | 1 | rod |
| NCT02134977 | ENSP00000356520 | 1 | RHa |
| NCT02134977 | ENSP00000405330 | 1 | estrogen receptor |
| NCT02383355 | ENSP00000263686 | 2 | CD62P;P-selectin |
| NCT02383355 | ENSP00000265316 | 1 | ABC |
| NCT02671604 | ENSP00000265970 | 2 | CPK |
| NCT02671604 | ENSP00000215882 | 1 | cTp |
| NCT02671604 | ENSP00000263100 | 1 | ABG |
| NCT02671604 | ENSP00000320117 | 1 | STP |
| NCT02671604 | ENSP00000348019 | 1 | AST |
| NCT02671604 | ENSP00000367038 | 1 | HES |
| NCT02671604 | ENSP00000378972 | 1 | STP |
| NCT03254264 | ENSP00000370546 | 4 | ASD |
And include references.
Since each study may be associated with multiple drugs, targets, and diseases, we build a table of all associated combinations, then aggregate by study (NCT_ID). For DOIDs with multiple terms, we keep only the most common term, for simplicity.
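The combination step can be sketched as an inner join on nct_id, which forms the per-study cross product of drugs and diseases and drops studies lacking either, after reducing each DOID to its most common term. This is a base-R sketch with hypothetical column names and toy rows; ties between equally common terms are resolved here by table order, which is an arbitrary choice.

```r
# Hypothetical disease matches and drug mentions per study.
dis <- data.frame(
  nct_id = c("NCT1", "NCT1", "NCT2", "NCT3"),
  doid   = c("DOID:9352", "DOID:9352", "DOID:9352", "DOID:162"),
  term   = c("type 2 diabetes", "T2DM", "type 2 diabetes", "cancer"),
  stringsAsFactors = FALSE
)
drg <- data.frame(nct_id = c("NCT1", "NCT1", "NCT2"),
                  cid    = c(101L, 102L, 101L),
                  stringsAsFactors = FALSE)

# Most common term per DOID across all studies.
tab <- as.data.frame(table(doid = dis$doid, term = dis$term), stringsAsFactors = FALSE)
tab <- tab[tab$Freq > 0, ]
tab <- tab[order(tab$doid, -tab$Freq), ]
top_term <- tab[!duplicated(tab$doid), c("doid", "term")]

# One row per study-disease pair, labelled with the DOID's top term.
dis_u <- unique(dis[, c("nct_id", "doid")])
dis_u <- merge(dis_u, top_term, by = "doid")

# Inner join on nct_id: every drug in a study pairs with every disease in it;
# studies without both drug and disease mentions drop out.
links <- merge(drg, dis_u, by = "nct_id")
print(links)
```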
## [1] "study-disease links: 237415"
NCT_ID→(NextMove:LeadMine)→SMILES
SMILES→(PubChem)→CID
Keep only studies including both disease and drug mentions.
## [1] "study-drug-disease links: 154971"
## [1] "studies with drug-disease links: 32832"
ACTIVITY_ID→(ChEMBL)→TARGET_CHEMBL_ID
TARGET_CHEMBL_ID→(ChEMBL)→COMPONENT_ID
COMPONENT_ID→(ChEMBL)→UNIPROT
## [1] "ACTIVITY_IDs: 127943 ; TARGET_CHEMBL_IDs: 3877 ; pairs: 127943"
## [1] "COMPONENT_IDs: 2535 ; TARGET_CHEMBL_IDs: 2481 ; pairs: 3157"
## [1] "UNIPROTs: 2535 ; SINGLE_PROTEIN UNIPROTs: 2183"
CID→(PubChem)→INCHIKEY
INCHIKEY→(ChEMBL)→MOLECULE_CHEMBL_ID
MOLECULE_CHEMBL_ID→(ChEMBL)→ACTIVITY_ID
## [1] "CIDs: 3783 ; INCHIKEYs: 3781 ; pairs: 3783"
## [1] "INCHIKEYs: 3314 ; MOLECULE_CHEMBL_IDs: 3314 ; pairs: 3316"
## [1] "MOLECULE_CHEMBL_IDs: 2302 ; TARGET_CHEMBL_IDs: 3877 ; ACTIVITY_IDs: 127943 ; DOCUMENT_CHEMBL_IDs: 16959"
## [1] "CID2UNIPROT links: 27008 ; CIDs: 2112 ; UNIPROTs: 2521"
## [1] "study-drug-disease-target links: 1725873"
## [1] "studies: 25486 ; drugs: 1560 ; diseases: 1814 ; targets: 2323"
| nct_id | drug_name | cid | disease_term | doid | gene_symbol | uniprot | idgTDL |
|---|---|---|---|---|---|---|---|
| NCT00583778 | ipratropium | 657309 | asthma | DOID:2841 | SMN2 | Q16637 | Tbio |
| NCT02940990 | docetaxel | 36314 | pan | DOID:9810 | TACR2 | P21452 | Tchem |
| NCT00002524 | Trimethoprim | 5578 | lymphoma | DOID:0060058 | DHFR | P00374 | Tclin |
| NCT00975494 | Sildenafil | 135398744 | ischemia | DOID:326 | PDE10A | Q9Y233 | Tclin |
| NCT01726335 | Risperidone | 5073 | schizophrenia | DOID:5419 | SLC22A2 | O15244 | Tchem |
| NCT02121756 | Dipyridamole | 3108 | liver disease | DOID:409 | ABCC5 | O15440 | Tchem |
| NCT00079313 | Imatinib | 5291 | lymphoma | DOID:0060058 | TYK2 | P29597 | Tclin |
| NCT00788008 | propofol | 4943 | analgesia | DOID:0060145 | HPGD | P15428 | Tchem |
| NCT02076243 | paclitaxel | 36314 | NSCLC | DOID:3908 | TACR2 | P21452 | Tchem |
| NCT00032097 | gadolinium | 151071 | glioblastoma multiforme | DOID:3068 | EGFR | P00533 | Tclin |
| NCT03117751 | Vorinostat | 5311 | testicular disease | DOID:2519 | HDAC3 | O15379 | Tclin |
| NCT01935973 | Trametinib | 11707110 | endometrial cancer | DOID:1380 | FGR | P09769 | Tchem |
| idgTDL | N |
|---|---|
| Tchem | 707 |
| Tclin | 324 |
| Tbio | 93 |
| Tdark | 2 |
## [1] "Study references: 388031 ; PMIDs: 287758 ; studies: 61208"
ACTIVITY_ID→(ChEMBL)→DOCUMENT_CHEMBL_ID
DOCUMENT_CHEMBL_ID→(ChEMBL)→PUBMED_ID
## [1] "DOCUMENT_CHEMBL_IDs:: 16198 ; PMIDs: 15193"
Evidence weighted by:
Powered by Rmarkdown.